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Abstract 

In the present paper, we propose and analyze a novel method for estimating a 
univariate regression function of bounded variation. The underpinning idea is to 
combine two classical tools in nonparametric statistics, namely isotonic regression 
and the estimation of additive models. A geometrical interpretation enables us 
to link this iterative method with Von Neumann's algorithm. Moreover, making 
a connection with the general property of isotonicity of projection onto convex 
cones, we derive another equivalent algorithm and go further in the analysis. As 
iterating the algorithm leads to overfitting, several practical stopping criteria are 
also presented and discussed. 

Index Terms — Nonparametric estimation, Isotonic regression. Additive models, 
Metric projection on convex cones. 

2010 Mathematics Subject Classification: 52A20, 62G08, 90C33. 

1 Introduction 

In a statistical setting, consider the nonparametric regression model 

Y = r{X)+e 

^Corresponding author. 
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where X and Y are both real- valued random variables with X uniform in [0, 1], E [Y"^] < oo 
and E [s\X] = (see for example [IS])- Assume, in addition, that the regression function r 
is right-continuous and of bounded variation. With this respect, the Jordan decomposition 
asserts that such a function can be written as the sum of a non-decreasing function u and 
a non-increasing function b 

r(x) = u{x) + b{x). 

Viewing this latter equation as an additive model involving the increasing part and the 
decreasing part of r, we propose a new estimator which combines two well-established 
tools in nonparametric regression: the isotonic regression related to the estimation of 
monotone functions, and the backfitting algorithm devoted to the estimation of additive 
models. 

The estimation of a monotone regression function dates back to the 50's and the early 
work by Ayer et al. [2]. Given a sample of independent and identically distributed 
(i.i.d.) random couples (Xi, Fi), . . . , (X„,y„) following the general model ([1]), denoting 
Xi = < . . . < Xn = X(^n) the ordered sample, and yi,...,yn the corresponding 

observations, the Pool- Adjacent- Violators Algorithm (PAVA) determines a collection of 
non- decreasing level sets solution to the minimization problem 

n 

min V {yi - mf . (2) 

ltl<...<M„ — ' 

i=l 

Since the cone 

C+ = {u = (m, . . . , u„) G : ni < . . . < n„} (3) 

is a closed convex set in M", there exists a unique solution to this minimization problem. 
This solution, called the isotonic regression of y and denoted iso(y), is the metric projection 
oi y = (yi, . . . , yn) on with respect to the Euclidean norm, that is 

n 

\so{y) = arg min \\y — = arg min [yi — Uif . (4) 

MgC+ U&C+ ^ — ^ 

i=l 

Correspondingly, the antitonic regression of y is the projection of y on the set of vectors 
with non- increasing coordinates, that is 

n 

antif?/) = arg min \\y — 6|p = arg min iyi — hif 
b&c- b£C- ^-^ 

i=l 

where C" = -C+ = {6 = . . . , 6„) G M'^ : fei > . . . > 6„,}. From now on, C+ and C" will 
be called monotone cones. 

A major attraction of isotonic regression procedures is their simplicity. Since they are 
nonparametric and data driven (i.e., they do not require the tuning of any smoothing 
parameter), these estimators have raised considerable interest since more than fifty years. 
A comprehensive account on the subject can be found in [3], statistical properties have 
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been studied in [B], [7], [51], [U], and extensions or improvements to more general order 
of the PAVA approach can be found in [TB] , and [S] for example. 

Still in nonparametric statistics, but in a multidimensional context this time, the additive 
models were suggested by Friedman and Stuetzle [18] and popularized by Hastie and 
Tibshirani [21] as a way to get around the so-called "curse of dimensionality" . In brief, 
this means that, in multivariate smoothing, nonparametric estimators have to consider 
large neighborhoods of a particular point of the space to catch observations, and hence 
large biases can result. The additive model assumes that the regression function can be 
written as the sum of smooth terms in the covariates: 



Since each variable is represented separately in ([5]), the additive model provides a logical 
extension of the standard linear regression and once an additive model is fitted, one can 
easily interpret the role of each variable in predicting the response. 

Buja et al. [S] proposed the backfitting algorithm as a practical method for fitting additive 
models. It consists in iterated fitting of the partial residuals from earlier steps until 
convergence is reached. If the current estimates are ri,...,fd, then rj is updated by 
smoothing y — Ylkj^j'^k against . While backfitting has attracted much attention and 
is frequently applied, it has been somewhat difficult to analyze theoretically. Nonetheless, 
when using linear smoothers in each direction, the convergence of the algorithm can be 
related to the spectrum of the individual smoothing matrices (see, e.g., [5] and [32]), and 
when all the smoothers are orthogonal projections, the whole algorithm can be replaced 
by a global projection operator [20] . 

There exist other multivariate methods based on repeated fitting of the residuals. Some of 
them, like L2-boosting [9], boosted kernel regression [H], iterative bias reduction [TOl[TT| . 
do not assume any particular structure for the regression function. The common principle 
of these approaches is to start out with a biased smoother or a weak learner, and then 
to estimate and correct the bias in an iterative manner. Hence, instead of smoothing 
the partial residuals, one smoothes the global residuals y — f^, and then correct the 
previous smoother. Just as for the backfitting, the convergence of these algorithms as 
well as the statistical properties of these estimators have mainly been studied in the case 
of linear smoothers. 

In our situation, however, it is noteworthy that projections on convex cones are not linear 
operators. But considering our iterative estimator as the application of Von Neumann's 
algorithm (see for example [I3]), we will show that iterating the procedure tends to 
reproduce the data. Moreover, we manage to go further in the analysis by proving that 
the individual terms of the sum converge as well to identified limits. This is in fact 
possible thanks to a result which is rather unexpected from the statistical point of view: 
iterating isotonic regression in a backfitting fashion or in a boosting fashion yield the same 
estimators at each step. 
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Interestingly, this result stems from a property of the projections onto monotone cones 
which, in our case, reads as follows (recall that iso(y) is defined by (jl])): 



y{y, y) G M" X M", y-y eC+ ^ iso(y) - iso(y) G C+ (6) 

From a more general perspective, one can see this equation as a particular case of the 
property of isotonicity of the projection onto convex cones. Here isotonicity is considered 
with respect to the order induced by the cone. The idea to relate the ordering induced 
by a convex cone and the metric projection onto the convex cone goes back to the paper 
by Isac and Nemeth [23], where a convex cone in the Euclidean space which admits an 
isotone projection onto it (called by the authors isotone projection cone) was character- 
ized. Thereafter, this notion was considered in the complementarity theory to provide 
new existence results and iterative methods [251 EEl EI] . 

Yet, the notion of the cone in the above cited papers is used in the sense of "closed convex 
pointed cone". Confronted with the question if the monotone cones C~ and C"*", which 
are not pointed, admit or not isotonic metric projections, we shall develop in Section [2] a 
general theory in order to apply it to this special case. This seems to be the simplest way 
to tackle this problem. Therefore, Theorem [1] below is interesting by itself. By using this 
approach. Corollary [2] states that the monotone cones C~ and admit isotone metric 
projections. 

Then, we come back to the statistical framework, the remainder of the paper being orga- 
nized as follows: the definitions, the analysis and the equivalence of the Iterative Isotone 
Regression estimator and the Iterative Isotone Bias Reduction estimator are detailed in 
Section [31 As iterating these algorithms tends to reproduce the data, we then explain 
how the procedure might be stopped in practice (Section H]). Finally, most of the proofs 
are gathered in the Appendix. 

To conclude this introduction, we would like to make a few comments on the topics 
that will not be addressed in the present document. Starting from a sample P„ = 
{(Xi, Yi), . . . , (X„, Yn)} of i.i.d. random couples with the same distribution as a generic 
pair (X, y), our estimator takes the form f^"\ where denotes the number of iterations 
possibly depending on the sample size n. In this framework, an important aspect is 
to specify conditions on the regression model ([T]) and on the sequence {kn) so that, for 
example. 
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(r(X)-f(^")(X)) 



where the expectation E[.] is considered with respect to the sample P„, and the generic 
variable X. If this property is satisfied, we say that the regression function fn""^ is 
consistent. It is universally consistent if this property is true for all distributions of 
(X, y) with E[y^] < oo (see for example [IS]). This important issue will not be pursued 
further here and will be addressed elsewhere by the authors. 
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2 Isotonicity of the projection onto the monotone 



cones 

It turns out that the isotonicity of the projection, as defined in (jS]) for the specific case of 
the cone C~^, is in general a very strong requirement which imphes the latticiahty of the 
order induced by the convex cone. Thus, the investigation of the isotone projection cones 
becomes part of the theory of latticially ordered Euchdean and Hilbert spaces. A simple 
finite method of projection onto isotone projection cones proposed in [30] has become 
important in the effective handling of all the problems involving projection onto these 
cones. Besides nonlinear complementarity, isotone projection cones have applications in 
other domains of optimization theory. The method proposed in has become important 
in the effective handling of the problem of map-making from relative distance information 
e.g., stellar cartography, see Section 5.13.2.4 in [13] and 

www . convexoptimization. com/wikimization/index .php/Projection_on_Polyhedral_Cone 

Although we shall not consider projection methods here, we stress that some of the results 
developed in [30] will be useful in our proofs. 

Let us first introduce some notations. If C is a non-empty, closed convex set in MJ^, then 
for each x G M" there exists a unique nearest point Pqx G C, that is, a point with the 
property that 

||x — Pcx\\ = inf{||x — c|| : c G C}, 

where ||.{| stands for the Euclidean norm in [35]. The mapping Pc : M."- C is called 
the nearest point mapping of onto C or simply the (metric) projection onto C. Let 
C be a closed convex cone in M"-, i.e., a closed nonempty set with (i) C + C G C, and 
(ii) tC C C, V t G M+ = [0, +oo). If C n (-C) = {0}, then C is called a closed convex 
pointed cone. In order to lighten the writings, and since all sets at hand in the following 
will be closed and convex, we propose to call them respectively "cone" and "pointed cone" . 

Lemma 1 Suppose that C is a cone, denote L = Cn{—C) the maximal subspace contained 
in C, L-^ its orthogonal complement, and K = L"*- (IC. Then, K is a pointed cone in L-^, 

C = K®L (7) 

where © stands for the orthogonal sum, and 

PcX = PKXk + Xi (8) 

where x = x^ + xi with xi & L and G L-*-. 

By putting u <c v whenever m, u G M" and v — u E C, the cone C C M" induces a 
semi-order <c in which is translation invariant (i.e. u <c v implies u + z <c v + z for 
any z G M") and scale invariant (i.e. u <c v implies tu <c tv for any t G 

The projection Pc is said C-isotone if m, G M", u <c v implies Pqu <c Pcv. If Pc 
is C-isotone, then C is called an isotone projection cone. At this point, we would hke 
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to emphasize that in [21], the authors investigate isotone projection properties only for 
pointed cones. Our purpose in this section is thus to generahze their resuhs to cones 
which are not necessarily pointed, hence introducing of the decomposition C = K (B L in 
Lemma [H where K is a pointed cone. 

Theorem 1 Let C C M" 6e a cone, 

C = K®L 

with L = C n (— C) and K = C H . Then, C is an isotone projection cone if and only if 
the pointed cone K C L-^ is an isotone projection cone in L-^ . 

A simple geometric characterization of the isotone projection (pointed) cones was given 
in [21]. It uses the notion of the polar of a cone. If C C is a cone, then the set 

= {y G : (a;, y) < 0, V a; G C}, 

is called the polar of the cone C. The set C"*" is obviously a cone. If the cone C is generating 
in the sense that C — C = M*^, then the polar C"*" is a pointed cone. We have the following 
easily verifiable result: 

Lemma 2 Suppose that C is a generating cone. Using the notations introduced in the 
TheoremUl and denoting the polar of the pointed cone K in the subspace L-*- by K^, we 
have the relation 

= iK^, 

where i is the inclusion mapping of L-^ into M". 

Putting together the main result in [21] , Theorem [1] and Lemma [21 we have the following 
conclusion: 

Corollary 1 The generating cone C is an isotone projection cone if and only if its polar 
C"*" is a cone generated by linearly independent vectors forming mutually non-acute angles. 

Let us focus our attention on the monotone cone 

C~ = {beW : bi>b2> ... > bn} 

It is readily seen that C~ is a generating cone, but not a pointed cone. Specifically, let 

L = C~ n (-C") = {x G : Xi = X2 = ... = 

then L G C~ , the maximal subspace contained in C~ , is of dimension one. We have also 
that 

K = L^nc- (9) 

is an [n — l)-dimensional pointed cone in the hyperplane L-*- and 

C- =C~n {L^ ®L) = K®L. 

We will prove that the pointed cone K given by ([9]) is an isotone projection cone in L-*-, 
hence the following result. 
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Corollary 2 The monotone cones C and admit isotone projections. 

This result will be the basic ingredient to show the equality between the algorithms I.I.R. 
and I.I.B.R. presented in the next section (see Theorem [2]) . 

3 Iterative Isotone Regression 

In this section, we first present the algorithm called Iterative Isotonic Regression (I.I.R. 
in short) which proceeds in alternating isotonic and antitonic regressions in a backfitting 
fashion. The connection with Von Neumann's algorithm and recent results in this topic 
will allow us to exhibit the limit of this estimator when the number of iterations goes to 
infinity. 

Let us first briefly recall the statistical framework and the idea behind our algorithm. We 
consider the nonparametric regression model 

Y = r{X)+e (10) 

where the random variable X is, for example, uniform in [0, 1]. Classical hypothesis for 
studying the consistency of an estimator of r are E [Y"^] < oo and E = 0, but this 
will not play a prominent role in the following since we will not investigate the statistical 
properties of I.I.R. More important, we will assume that the regression function r is right- 
continuous and of bounded variation. Then, the Jordan decomposition asserts that such 
a function can be written as the sum of a non- decreasing and a non-increasing function: 
r{x) = u{x) + b{x). Specifically, if we impose that u and b have singular associated Stieltjes 
measures and, for example, that 

/ u{x) dx = / r{x)dx, (11) 
Jo Jo 

then the decomposition is unique. In this case, we adopt the following notation 

Vx G [0, 1] r(x) = u*{x) + fe*(x), (12) 
and we call this latter writing the Jordan minimum variation decomposition of r. 

3.1 Framework, notations and convergence 

In a statistical setting, we shall use a sample of i.i.d. random couples (Xi, Yi), . . . , (X„, Yn) 
following the model ffTOl) and try to estimate the regression function r. Viewing u* 
and h* as the components of an additive model involving two monotone terms, the 
idea of I.I.R. is to apply alternatively isotonic and antitonic regressions. Recall that 
Xi = < . . . < Xn = X(n) denotes the ordered sample, that yi, . . . ,y„ are the cor- 
responding observations and that iso{y) (resp. anti(|/)) is the isotonic (resp. antitonic) 
regression of y = (t/i, ?/„)'• 
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Besides, if z = {zi, . . . , Zn)' , the notation A{z) stands for the (n — 1) dimensional vector 
defined as 

A(^) = (Z2 - Zi, . . . , Zn - Zn-l)' ■ 

Considering two vectors z and z, we may write A{z) o A{z) for their term-by-term (or 
Hadamard) product, which means 

A{z) O A{z) = {{Z2 - Zi) X {Z2 - Zi), ...,{Zn- 2„-l) X (z„ - Zn-l))' ■ 

If A{z) o A{z) = 0, we will say that z and z have singular variations. 

We are now in a position to specify the Iterative Isotonic Regression algorithm. 



Algorithm 1 Iterative Isotonic Regression (I.I.R.) 

(1) Initialization: 

6(°)=(6f\...,6i°))' = 

(2) Cycle: for A; > 1 

U^^ = anti [y - u^^'^) (13) 

(3) Iterate (2) until a stopping condition to be specified. 



We prove in the Appendix (see Section [575]) that at each iteration k >1, 

( A (n('=)) o A (^feW) = 

< ¥'^) = y (14) 
[ E« = 0, 

where y := (XlILi ^Z*)/""- stands for the empirical mean of y. These equations ensure the 
identifiability of the decomposition y^''^ = vS-^'^ + h^^^ . One might indeed consider them as 
the translation of conditions fllip in a discrete context. In other words, given y^'^\ both 
vectors iS^^ and fe*-^'^ are uniquely specified. 

Figure [U illustrates the application of the I.I.R. algorithm on the example of the fonction 
r which is drawn on the top left hand side of the figure. Still on the top left hand side 
of the figure, we have also plotted n = 100 points {xi,yi) to obtain our sample: for each 
point, one has yi = r{xi) + Ei, where is a Gaussian centered random variable. Besides 
these sample points, the three other figures show the estimations y^''^ with k = 1, 10, 1000 
iterations. 
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Figure 1: Application of the I.I.R. algorithm for k = 1, 10, 1000. 

One can see that for each iteration the method fits a piecewise constant function to the 
data and that increasing the number of iterations leads to increase the number of level 
sets (clusters). It also appears on this example that iterating the algorithm leads to the 
interpolation of the original data. This property is always true, as established by the 
following result. 

Proposition 1 With the previous notations, one has 

lim y^'^^ = y. 

fc— >-oo 

Proof. In the following, y is held fixed and, as usual, y + stands for the translated 
cone 

+ = {y + G C+}. 

All the notations are recalled on figure [2j To understand the geometrical forces driving 
Proposition [H this figure also provides a very simple interpretation of the algorithm, as 
it illustrates that the sequences of vectors ■u'-'^^ and y — M'^^ might be seen as alternate 
projections on the cones and y + . In what follows, we justify this illuminating 
geometric interpretation in a rigorous way, and we explain its key role in the proof of the 
convergence as k goes to infinity. 

First, by definition, we have u^^^ = Pc+{y), and a moment's thought shows that 

P,+c+(^^'^) =y + PcAu^''> -y)=y-Pc-{y- ^^'^ 
Then, coming back to the very definition of b^^^ = Pc-{y — m*-^^) leads to 
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In the same manner, since ■uP''^ = Pc+{y — b^^^), we get 

y _ 5(2) =y_p^_^y_ ^(2)) ^ y + p^^^y _ ^(2)) ^ P^_^,^(^(2)) 

More generally, denoting = 0, this yields for a\\ k > 1 



u 



(^) = Pc+ {y - 6(^-1)) and = Py+c+ (u^^^) 



.6(2) 








\ 

'^ 








/ lU' 


















^^^^^^ fc'l) 




\ \ 



Figure 2: Interpretation of the I.I.R. method as a Von Neumann's algorithm. 
It remains to invoke Theorem 4.8 in [1] to conclude that 

k—>oo 

which ends the proof of Proposition [H □ 
A few remarks are in order: 

1. The take-home message here is that iterating the algorithm leads to overfitting, 
which is of course not desired in practice. Consequently, a stopping criterion must 
be applied in order to avoid this phenomenon. In this aim, we have tested several 
pratical rules, and this will be the object of Section HI 

2. Note that Proposition [1] ensures the convergence of the sum u^'^'' +6^^) but does not 
say anything about the convergence of its individual terms u^'^^ and b^''\ However, 
Corollary 4.9 in |1] asserts that the sequences ii^''^ and M^-* are also convergent. 
The specification of these limits will be possible in our situation thanks to the 
introduction of another equivalent algorithm, called I.I.B.R. This will be the topic 
of the next subsection. 
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3. In [2S], Mammen and Yu consider isotonic functions in the multivariate additive 
model They rely on the analysis of Dykstra's algorithm sequences [T7] to show 
that for fixed n the backfitting procedure converges to the solution of 

n 

argmin^ (l/j - ujf 
j=i 

where u is the sum of d vectors in C~^. Hence, our case is rather comparable to the 
one considered by these authors when d = 2. Correspondingly, in our setting, one 
can see the vectors y — feC^"^) — -uC^) and y — u^''^ — b^'^^ as Dykstra's sequences and 
consider Mammen and Yu's alternate projections on the polar cones of C"*" and C~ to 
prove Proposition[TJ However, as we are alternating projections onto opposite cones, 
considering f^^^ and y — h^^'^ yields the easier Von Neumann's type interpretation 
given in our proof. 

3.2 The connection with Iterative Isotonic Bias Reduction 

In this section we propose another algorithm inspired by bias reduction techniques in 
regression [9] [10]. In a nutshell, the idea here is to work on the updated residuals y — y^^^ 
in order to refine the estimator at each step. Specifically, this algorithm takes the following 
form. 



Algorithm 2 Iterative Isotonic Bias Reduction (I.I.B.R.) 

(1) Initialization: 

= 

(2) Cycle: for A; > 1 

u^^) = iso [y - y^''^^^) 
hi^) = anti [y — y^^~^^ — 

Updating: 

(3) Iterate (2) until a stopping condition to be specified. 



Interestingly, it turns out that algorithms I.I.R. and I.I.B.R. coincide. This remarkable 
fact, which is the purpose of the next theorem, deeply relies on the property of isotonicity 
of the projection onto the cones we consider. As a by-product, it will allow us to make 
precise the individual limits of the sequences viP''^ and 
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Theorem 2 With the previous notations, for all k > 1, 



u 



E 



and 



6^ = ^60). 



Proposition [T] ensures that y^'^^ tends to y when k goes to infinity. Thanks to Theorem [2] 
we will go one step further and show in Corollary [3] that the individual terms ■u'-'^^ and M'^^ 
tend to some explicit limits u* and b*. These limits are simply the discrete analogous of 
the Jordan minimum variance decomposition of a function with bounded variation f|T2|l 



for the vector y. As will be stated below, they are indeed characterized by y = u* + b* 
and 

AK)°A(6;) = 0, 
the empirical means of y and u* being the same. 

Before proceeding, let us illustrate this on the example of figure [31 The left-hand side 
represents the vector y and the decomposition y = u* + b*. One can see that 



yi+i -yi>0 



u 



y,i+l 



U 



- yi and bl-^^ - bl ^ = 



and conversely. 



yi+i -yi<0 



and fe* - bl. 



Vi+i - Vi, 



which, in the discrete case, amounts to say that u* and b* have singular associated Stieltjes 
measures in equation (|T2|) . The right-hand side of the figure illustrates that when the 
number of iterations k goes to infinity, m^'^'^ and M'^'^ converge respectively to u* and b*. 





0.0 0.2 0.4 0.6 0.8 1.0 



Figure 3: Respective convergences of u^^^ and b^''^ to u* and b*. 



12 



Corollary 3 The sequences (^'''^^)^.>]^ one? \^U^^j admit the following limits 



lim u^'') = u* and lim U'^'^ = bl 

k^oo " fc— >oo ^ 

where u* and b* are such that y = u* + b*, u* = y, b* = and A{y) = A(n*) o A{b*). 

Proof. In order to lighten the notations a bit, let us assume that the empirical mean of 
y is equal to zero. Consequently, all the intermediates of both algorithms have also zero 
mean, as well as u* and b*. This allows us to work in the hyperplane S, that means the 
subspace of consisting in all zero mean vectors. In this hyperplane, we introduce the 
norm V to quantify the variations of a vector: 

V: ^ ^ M+ 

y ^ Y{y) = ^'.'^^ ly.^i - y.\ 

Then, it is routine to check that for any vector z & S and any decomposition z = + bz 
with Uz and b^ both in we have the following equivalence 

A{uz) o A{bz) = ^ V{z) = V{uz) + V{bz). (15) 

Now, Corollary H] in Section [5.51 ensures that for all > 1, ii^''^ and U^^ have singular 
variations (one can also see that on figure [3]) which implies that for all /c > 1 

V(yW) = V(m('=))+V(&(^)). 

Accordingly, we just have to justify that the limits := limfc^oo u^''"' and 6°° := limfc_i,oo ^'•'^^ 
satisfy V(u°°) + V(6°°) = V{y) to deduce that A(u°°) o A(6°^) = 0, hence = and 
b°° = b*. The existence of the limits implies that for alH G {1, . . . , n}, 

u°° := hm ^ and 6°° := lim fcf ^ 

are well-defined. Then, the continuity of the norm V leads to 

lim V(u('=)) = V(n°°) and lim Y{b^^^) = V(6°°), 

fc— ^-oo fc— S-oo 

and the same relation holds for y'-'^'^ and y. Thus, 

V{y) =v(lim,^oo{w(') + 

= limfc^ooV(MW + 6(^-)) 

= limfc^oo{v(n«)+V(6('=))} 

= limfc^oo V(n(^)) + lim^^o, V(6(^')) 

= V (limfc^oo u^"^) + V (linifc^oo feC^)) 

= V(u~) + V(6°°). 

From equivalence f[T^ . we deduce that = u* and b* = b°°, where y = u* + b* is the 
Jordan minimum variance decomposition of y. 

□ 
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4 Discussion 



As increasing the number of iterations leads to overfit the data, iterating the procedure 
until convergence is not desirable. This brings up to the choice of a stopping rule which 
could be used in practice. Viewing the latter question as a model selection issue suggests 
stopping criteria based for example on Akaike Information Criterion (AIC, see [T]), mod- 
ified AIC criterion (AICc, see [23]), Bayesian Information Criterion (BIC, see [53]) and 
Generalized Cross Validation (GCV, see [T2]). 

For a linear smoother f\ = S\{y), with A the smoothing parameter and Sx the smoothing 
matrix, these selectors can be written in the common form 



where RSS denotes the residual sum of squares and </>(.) is an increasing function of the 
number of degrees of freedom px of the smoother. One usually takes px = tr(S'A) or 
px = tr(S'AS'^) (see [8|, section 2.7.3) and according to the various criteria mentioned 
above 



Thus, equation f ll6p leads to a tradeoff between the goodness of fit and a penalization of 
high model complexity. Although isotonic regression is not a linear smoother, we refer 
to Meyer and Woodroofe [29] to consider that the number of distinct values among the 
fitted vector provides the effective dimension of the model. Taking this into account and 
considering that increasing the number of iterations tends to reduce the residual sum of 
squares but raises the complexity of the model, a natural extension for iterative isotonic 
regression is to replace px by the number of sets of the fitted vector f^''^ in ( IT6|) and solve 



over a grid JC of iterations. 

We have applied and compared these stopping rules for iterative isotonic regression 
through simulated data. It appears that AICc shows the best performances among the 
three other criteria for most investigated cases. Then, for this specific stopping criterion, 
we have compared iterative isotonic regression with non parametric competitors, namely 
local polynomial regression (R package locpol), and smoothing spline regression (R pack- 
age stats, function smooth. spline). For further details on this topic, we refer the interested 
reader to the R package dedicated to I.I.R. at the following address: 

www . sites . uiiiv-rennes2 . f r/laboratoire-statistique/ JEGOU/ index . html 




(16) 




(17) 




(18) 
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The take-home message is that I.I.R. can not compete with sphne or local polynomial 
regression for smooth functions. However, when the functions have discontinuities, our 
estimator compares favorably with his competitors, in particular when the sample size 
increases. This suggests that our method could be used to locate discontinuities in a 
regression framework. Applications arise for example in genomic where the detection of 
breakpoints from Array Comparative Genomic Hybridization (array CGH) profiles is of 
crucial importance to identify genes involved in cancer progression [22]. 

5 Appendix 

5.1 Proof of Lemma [1] 

The relation ([7]) follows directly from 

C = Cn{L^ ®L). 

It is known (see [35]) that the projection Pqx of x onto the cone C is characterized by the 
couple of relations: 

{x-Pcx,y)<0, \/yeC, (19) 

and 

{x - Pcx, Pcx) = 0. (20) 

Hence, we have to verify the above relations for P^x^ + xi instead of Pcx. By the relation 
([7]), PxXk + xi & C. Then, take an arbitrary y ^ C represented by ([7]) in the form 

y = yk + yi 

with yk G K and yi G L. Then, we have 

{xk + xi- (Pft-Xfc + xi),yk + yi) = {xk - PxXk, yk) <0, y = yk + yi ^ C, 

because yi is perpendicular to Xk — Px^k G L-^, and because of the relation similar to f[T^ 
characterizing the projection of Xk onto the pointed cone K in L-*-. Thus, relation (I19p 
holds for PxXk + xi in place of Pcx. We further have 

{Xk + Xi- {PxXk + Xi), PxXk + Xi) = {Xk- PxXk, PxXk) = 

because xi is perpendicular to Xk — PxXk and because of the relation similar to fl20ll applied 
to Xk G L-^ and its projection onto K. The obtained relation is exactly fl20l) for PxXk + xi 
instead of Pqx. 
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5.2 Proof of Theorem [T] 

Take u, v & L-^. Then, u <k v is equivalent to u <c v. If Pc is C-isotone, then u <c v 
imphes by Lemma [1] 

Pkv - Pku = Pcv - Pcu G C. 
Since Pru, Prv G L-*-, it follows that 

Pkv - Pku eL^nC = K. 

The obtained relation shows that P^ is /T-isotone, concluding the proof of the necessity 
of the theorem. 

Suppose now that Pk is i^'-isotone and take u, G M" with u <c v. U u = Uk + ui and 
V = Vk + vi with Uk, Vk G L-^, and Ui, Vi G L, then using formula ([7]) 

V — u = Vk — Uk + vi — ui E K Q) L 

and hence Vk — Uk E K, that is Uk <k Vk and by the iC-isotonicity of Pk it follows that 

PKVk - PxUk G K. 

Hence, using formula ([8]) we have 

Pqv - Pcu = PxVk + vi - PxUk -ui = PxVk - PRUk + vi - ui e K ® L = C. 
That is Pqu <c Pcv, which concludes the isotonicity of Pc- 

5.3 Proof of Corollary [2] 

It clearly suffices to prove that the monotone cone C~ = {6 G : bi > b2 > ... > bn} 
admits an isotone projection. For this, we have to introduce some notations. Let us first 
take the following base in M": 

ei = (l,0,...,0),e2 = (l,l,0,...,0),...,e„_i = (1, 1, 0), e„ = (1,1,...,!). 

Then, an arbitrary element b = {bi, bn) G can be represented in the form 

b= {bi- 62)61 + (62 - 63)62 + ••• + (6n-l - bn)en-l + bnCn, (21) 

the relation b E being equivalent with 

6,_i-6, >0, j = 2,...,n. (22) 
Let us consider further the following base in L-^: 

e[ = in-l, -1, -1, -1), = (n - 2, n - 2, -2, -2), . . . , = (1, 1, "(^ - !))• 
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The following notation is standard in the convex geometry and ordered vector space 
theory: If M C MJ^ is a non-empty set, then let 

cone(M) = {tini + ... + tkUk : Ui e M, U eR+ = [0, +00), i = 1, k; k e N}. 

The set cone(M) is the minimal cone containing the set M and it is called the cone 
generated by M. Denoting 

L = C-n {-C-) = {be : 61 = 62 = ... = bn}, 
and K = n C~ , we will see next that 

i^ = cone({e;,...,e;_i}). (23) 
Since e'j G C~ fl L-*-, we have obviously that 

cone({e;,...,eU})ci^. (24) 
Comparing the vectors Cj and e'j we get 

^— (e'. + e„) = Cj, j = l,...,n-l. (25) 

n — J + 1 

By substitution of Cj, j = 1, n — 1, the representation (!?!]) of b becomes 

11 1 

6 = (61 - 62) - {e[ - Cn) + (&2 - ^3) (e's + e„) + ... + (6„„i - 6„) - (e^.i + e„) + 6„e„ (26) 

Suppose now that b E C~, that is, relations ( 122|) hold. Then, the coefficients of e^-, j = 
1, ...,n — 1 in its representation (126|) are non- negative. Thus, we have 

6 G ^ 6 = ^ tje^. + tnCn, tj e M+, j = 1, n - 1, t„ G M. (27) 

i=i 

In particular, if 6 G -fC, then, by ([9]), we have 6 G L-^. Hence, by multiplying fl27|) scalarly 
by e*^ and by using (6, e„) = (which follows from b E and e„ G L) and (e^-, e„) = 
(which follows from e'j G L"*- and e„ G L), we get t„ = 0. This reasoning shows that 

K C cone({e'i,...,e^_J), 

inclusion which together with 0241) proves (125]) . We consider now the vectors 

m = (-l,l,0,...,0),n2 = (0,-1, 1,0,..., 0),...,u„_i = (0, ...,0,-1,1). 
Then, Ui G L"*", i = 1, n — 1, and we have 

(M,,e;.) = 0if27^j, (M„e:)<0, ^,j = l,...,n-l. (28) 
According to the reasonings in [30] the relations fl28l) show that 

cone({Mi, ...,M„_i}) 
is the polar of K in the subspace L-*-. Further, we have 

{ui, Uj) < if i 7^ j. 

By the main result in [21] this shows that K is an isotone projection pointed cone in L-*-. 
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5.4 Proof of Theorem [2] 

First we prove that for all /c > 1, 

^(fc+i)_^WeC+ and 6(^^+1) -^WgC". (29) 

For A; = 1, we have 

M(2)-^« = iso(y-6W)-iso(y). 

Since —b^^^ G and as is an isotone projection cone by Corollary [2], we deduce that 
{^(2) _ {^(1) belongs to C^. In the same manner, 

6(2) _ 5(1) = anti(?/ - - anti(?/ - u^^^), 

which is equivalent to 

6(2) - 6(1) = anti {y - u^''^ - {u^^^ - n^^))) - anti(2/ - u^'^). 

Now, as was just noticed, — (u(2) — u(i)) belongs to C~ , which is also an isotone projection 
cone, so that M^) — M^) is indeed in C~~. We can iterate this reasoning. For example, at 
the next step: 

^(3) _ ^(2) ^ -^^^y _ _ -^^^y _ ^ (^y _ ^(l) _ ^^(2) _ ^ ( l ) ^ _ ^^^^^ _ p)^^ 

and we may go on with the same arguments as before. 

Next, we prove the desired result by induction on k. At the first step {k = 1), both 
algorithms clearly coincide. Now let us assume that it is still true at step k > 1, which 
means 

^ik) ^ ^(k) ^ p) ^i^h ^(k) ^ J2 w(^) and U'^ = J2 b^'^- 

i=i i=i 
Our objective is to prove that {i('^+^) = Yl'j=i this, let us show that 

Wy^bW = ||^_^(fc)_^(fe+l)||. (30) 

Due to the fact that ■u(^'+^) is the best non-decreasing approximation of y — 6^^), and since 
uik) + itself 

is a non- decreasing sequence, we deduce that 

lly _ 6(^0 _ ^(fc+l)|| < _ 6^=) - (uC^) + u(^+^))\\ = \\y - yi^) - ^(^-+1)11. 

Now, as ■u('^'+^) is the best non- decreasing approximation oi y — y^'^\ it is in particular a 
better approximation than -uC^+i) — -uC^), this latter being itself non- decreasing as was just 
established above (see (|2^ ). Thus, we are led to 

||y_^(fc)_^(^-+l)|| < ||^_^(^-)_(^(fc+l)_^(fc))|| = \\y_p) _^(k+l)l 
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Putting all the pieces together, f l5U]) is proved, and we get 



This indicates that et u^'^~^^^ — ■u(^) both realize the minimal distance to y — y^'^\ 

As a consequence. 



and finally 



(fc+ 



The same arguments may be repeated to establish that U^^^^ = X]^=i^''"'^- Details are 
omitted. 



5.5 Proof of identifiability conditions 

The purpose of this section is to prove the identifiability conditions (1141) for the I.I.R. 
algorithm, that is, for all A; > 1 (see also figure H]) 

A (n^'^)) o A = 

6^ = 0, 




Figure 4: Illustration of identifiability conditions. 

The two last equations rely on a characterization of metric projections which was already 
mentioned in the proof of Lemma [H From equations (1191) and fl2Ul) . we know that for any 
y G M", the vector u G C"*" (resp. C~) is the isotonic (resp. antitonic) regression of y if 
and only if 

{y-u,u) = 
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and 



Wv e C+(resp. C"), {y -u,v) < 0. 



Taking successively v = {l/n, . . . , 1/n)' G C'^ fl C and v = (— 1/n, 
in fl3T]) leads to 



(31) 

4/n)'GC+nC- 



^ n 1 " 

- iso(y)i = - anti(i/)i = 

77, n ^-^ 

1=1 1=1 



which proves that for al\k > 1, u^^^ = y andfeC^) = 0. To prove that A ('u'^'^^)oA yU^^j = 0, 
we will need the following result. 

Proposition 2 Let ?/ G M", then 

u = iso{y) and b = anti(?/ — u) ^ ^(^) ° ^(^) = (32) 

and 

b = anti(?/) and u = iso(y — b) ^ ^(^) ° ^(^) = 0. (33) 
Proof. Let us prove for example that 

Ui < Ui+i =^ hi = bi+i. 

We first establish that after an isotone regression, the last observation yi of a cluster Ai is 
always lower than or equal to Ui. For this, we use a proof by contradiction, arguing that 
a situation like the one depicted in figure [5] is impossible. 




Figure 5: Notations for the clusters and Aj+i. 

Let us denote Ai the cluster with last element y^. For the sake of simplicity, Ui stands for 
the last common value to all elements of cluster Ai after the isotone regression. From the 



20 



properties of the Pool Adjacent Violators Algorithm (see for example [3]), it is well-known 
that Ui satisfies the following equation 



where \Ai\ is the cardinal of Ai. Equivalently, the next cluster is denoted Aj+i and the 
corresponding isotonic value Wj+i. 

Now, let us assume that Ui < yi, and denote 



so that, clearly, Cj_i < Ui. Besides, it is readily seen that Ui < Wi+i, else Ui would belong 
to cluster Aj+i. Since Ci_i is the average of the y/s belonging to Ai — {ui}, the following 
inequality holds 

SO that 

Y ~ + (y^ ~ y^")^ + Yl ^y^ ~ 

< Y ^y^ " + ~ + 5Z ~ 
< Y'<y^ " + XI ^y^ " Ui+\f- 

This latter inequality indicates that the isotone regression with the values {cj_i, Uj+i} 
would be better than the original one with which is in contradiction with the 

very definition of the isotone regression, therefore <Ui. 

We prove in the same way that: 
Hence, we obtain 

Vi = yi - Ui < and r^+i = yi+i - Ui+i > 
so that Tj < Tj+i, and bi = which is the desired result. □ 

This result enables us to prove the so-called identifiability conditions for algorithms I.I.R. 
and I.I.B.R. 
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Corollary 4 For all k > 1, 

A(n(^))oA(6(^)) =0. (34) 

and 

A(m(*=))oA(6(^)) =0. (35) 

Proof. Equation flMl) follows immediately from Proposition [2l On the other hand, 
equation fl35|) requires more attention. We prove this by induction. This is obviously true 
for = 1 by Proposition [21 Let us fix /c > 1, and assume that 

Since = m^^^ + and b^^^^^ = + we get 

A(m('=+^)) o A(6('=+i)) = A(m('=)) o A(6('=+i)) + A(m('=+^)) o A(6(^)), (36) 

and our objective is to prove that both terms on the right-hand side of this equation are 
equal to zero. First notice that 

= anti(2/ - y^^^ - u^^+^^) = anti(y - U^^ - u^^+^^) 

and by definition of u^'^^'^\ 

^(^+1) = iso(y-6(^)). 

Then, Proposition [2] gives 

A(n('=+^)) o A(6(^'+i)) = 

so that 

{A(iz('=)) + A(m(^+i))} o A(6('=+i)) = 

and finally 

A(m('=)) o A(6(^-+i)) = 0. 

Let us now turn to the second term on the right-hand side of equation fl36l) . Just note 
that 

= iso(y - y^^^) = iso(y - u^"') - U^^) 

where 

U^^ =anti(y-{i('=)). 
Invoking Proposition [2] again, we are led to 

A{u^^+^^) o A{U^)) = 
and the right-hand side of equation fl32|) is equal to zero. □ 

Remark: One can notice that the proof of Corollary H] uses the fact that algorithms I.I.R. 
and I.I.B.R. coincide (Theorem [2]), but of course the proof of this theorem did not make 
any use of Corollary HI A quick inspection shows that this last result is applied only in 
the demonstration of Corollary [31 
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